    Vectors of Locally Aggregated Centers for Compact Video Representation

    We propose a novel vector aggregation technique for compact video representation, with application to accurate similarity detection within large video datasets. The current state of the art in visual search is the vector of locally aggregated descriptors (VLAD) of Jégou et al. VLAD generates compact video representations based on scale-invariant feature transform (SIFT) vectors (extracted per frame) and local feature centers computed over a training set. To increase robustness to visual distortions, we propose a new approach that operates at a coarser level in the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the latter with respect to given centers of local feature centers (CLFCs), extracted from a training set. The sums of differences between the LFCs and the CLFCs are aggregated to generate an extremely compact video description used for accurate video segment similarity detection. Experimentation using a video dataset comprising more than 1000 minutes of content from the Open Video Project shows that VLAC obtains substantial gains in terms of mean Average Precision (mAP) against VLAD and the hyper-pooling method of Douze et al., under the same compaction factor and the same set of distortions.
    Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME 2015, Torino, Italy
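    The aggregation pipeline the abstract describes (cluster a segment's SIFT descriptors into LFCs, then encode the LFCs against trained CLFCs by summing differences) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the choice of k-means, and the final L2 normalization are assumptions.

    ```python
    # VLAC-style aggregation sketch; see assumptions noted above.
    import numpy as np
    from sklearn.cluster import KMeans

    def local_feature_centers(sift: np.ndarray, n_lfc: int) -> np.ndarray:
        """Cluster one video segment's SIFT vectors into local feature centers (LFCs)."""
        return KMeans(n_clusters=n_lfc, n_init=10).fit(sift).cluster_centers_

    def vlac(lfcs: np.ndarray, clfcs: np.ndarray) -> np.ndarray:
        """Encode LFCs against the trained centers of LFCs (CLFCs) by
        aggregating sums of differences, one residual block per CLFC."""
        # Assign each LFC to its nearest CLFC.
        assign = np.argmin(((lfcs[:, None, :] - clfcs[None, :, :]) ** 2).sum(-1), axis=1)
        desc = np.zeros_like(clfcs)
        for k in range(clfcs.shape[0]):
            members = lfcs[assign == k]
            if len(members):
                desc[k] = (members - clfcs[k]).sum(axis=0)  # sum of differences
        desc = desc.ravel()
        return desc / (np.linalg.norm(desc) + 1e-12)        # L2-normalized descriptor
    ```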

    Leaping Into Memories: Space-Time Deep Feature Synthesis

    The success of deep learning models has led to their adaptation and adoption by prominent video understanding methods. The majority of these approaches encode features in a joint space-time modality for which the inner workings and learned representations are difficult to visually interpret. We propose LEArned Preconscious Synthesis (LEAPS), an architecture-agnostic method for synthesizing videos from the internal spatiotemporal representations of models. Using a stimulus video and a target class, we prime a fixed space-time model and iteratively optimize a video initialized with random noise. We incorporate additional regularizers to improve the feature diversity of the synthesized videos as well as the cross-frame temporal coherence of motions. We quantitatively and qualitatively evaluate the applicability of LEAPS by inverting a range of spatiotemporal convolutional and attention-based architectures trained on Kinetics-400, which, to the best of our knowledge, has not been previously accomplished.
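    The inversion loop described above (iteratively optimizing a noise-initialized video so a frozen space-time classifier assigns it the target class, under smoothness and temporal-coherence regularizers) might look roughly like the PyTorch sketch below. The regularizer forms and weights are illustrative assumptions, and the stimulus-priming step is omitted; this is not the paper's exact formulation.

    ```python
    import torch
    import torch.nn.functional as F

    def invert(model, target_class: int, steps: int = 2000, lr: float = 0.1,
               shape=(1, 3, 16, 112, 112), tv_weight=1e-4, tc_weight=1e-3):
        """Optimize a noise-initialized video toward a target class of a
        frozen space-time classifier with input shape (B, C, T, H, W)."""
        video = torch.randn(shape, requires_grad=True)
        opt = torch.optim.Adam([video], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            logits = model(video)                 # (1, num_classes)
            cls_loss = F.cross_entropy(logits, torch.tensor([target_class]))
            # Spatial total variation: encourages piecewise-smooth frames.
            tv = video.diff(dim=-1).abs().mean() + video.diff(dim=-2).abs().mean()
            # Temporal coherence: penalizes abrupt frame-to-frame changes.
            tc = video.diff(dim=2).pow(2).mean()
            (cls_loss + tv_weight * tv + tc_weight * tc).backward()
            opt.step()
        return video.detach()
    ```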

    Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks

    Natural Language Explanations (NLE) aim to supplement the prediction of a model with human-friendly natural text. Existing NLE approaches involve training separate models for each downstream task. In this work, we propose Uni-NLX, a unified framework that consolidates all NLE tasks into a single, compact multi-task model using a unified training objective of text generation. Additionally, we introduce two new NLE datasets: 1) ImageNetX, a dataset of 144K samples for explaining ImageNet categories, and 2) VQA-ParaX, a dataset of 123K samples for explaining the task of Visual Question Answering (VQA). Both datasets are derived by leveraging large language models (LLMs). By training on the 1M combined NLE samples, our single unified framework is capable of simultaneously performing seven NLE tasks, including VQA, visual recognition, and visual reasoning tasks, with 7X fewer parameters, demonstrating comparable performance to the independent task-specific models of previous approaches, and in certain tasks even outperforming them. Code is at https://github.com/fawazsammani/uni-nlx
    Comment: Accepted to ICCVW 2023
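    A unified text-generation objective amounts to flattening every NLE task into the same (prompt, target) format for one model. The toy sketch below illustrates this idea; the prompt template, task tags, and example texts are assumptions for illustration, not the paper's actual format.

    ```python
    def to_unified_sample(task: str, inputs: str, answer: str, explanation: str):
        """Flatten any NLE task into one (prompt, target) text-generation pair."""
        prompt = f"[{task}] {inputs}"
        target = f"{answer} because {explanation}"
        return prompt, target

    samples = [
        to_unified_sample("vqa", "What is the man holding?", "a racket",
                          "he is standing on a tennis court mid-swing"),
        to_unified_sample("imagenetx", "<image> classify", "golden retriever",
                          "the dog has long golden fur and floppy ears"),
    ]
    # One model trained with token-level cross-entropy over all such pairs
    # covers every NLE task jointly, instead of one model per task.
    ```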

    Fast Desynchronization For Decentralized Multichannel Medium Access Control

    Distributed desynchronization algorithms are key to wireless sensor networks, as they allow for medium access control in a decentralized manner. In this paper, we view desynchronization primitives as iterative methods that solve optimization problems. In particular, by formalizing a well-established desynchronization algorithm as a gradient descent method, we establish novel upper bounds on the number of iterations required to reach convergence. Moreover, by using Nesterov's accelerated gradient method, we propose a novel desynchronization primitive that provides for faster convergence to the steady state. Importantly, we propose a novel algorithm that leads to decentralized time-synchronous multichannel TDMA coordination by formulating this task as an optimization problem. Our simulations and experiments on a densely-connected IEEE 802.15.4-based wireless sensor network demonstrate that our scheme provides for faster convergence to the steady state, robustness to hidden nodes, higher network throughput, and comparable power dissipation with respect to the recently standardized IEEE 802.15.4e-2012 time-synchronized channel hopping (TSCH) scheme.
    Comment: to appear in IEEE Transactions on Communications
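    Viewing the desynchronization update as gradient descent on the node phases suggests the toy simulation below, with a Nesterov-style momentum variant for faster convergence. The step size, momentum coefficient, and displacement form are illustrative assumptions, not the paper's derivation.

    ```python
    import numpy as np

    def desync_displacement(phases: np.ndarray, alpha: float = 0.5) -> np.ndarray:
        """Per-node move toward the midpoint of its two phase neighbours on the
        unit firing circle: the desync update seen as a gradient step."""
        order = np.argsort(phases)
        p = phases[order]
        prev_ = np.roll(p, 1);  prev_[0] -= 1.0   # wrap the first node's left neighbour
        next_ = np.roll(p, -1); next_[-1] += 1.0  # wrap the last node's right neighbour
        disp = np.empty_like(phases)
        disp[order] = alpha * ((prev_ + next_) / 2.0 - p)
        return disp

    def desync_nesterov(phases: np.ndarray, steps: int = 200,
                        alpha: float = 0.5, mu: float = 0.7) -> np.ndarray:
        """Nesterov-style acceleration: evaluate the update at a momentum
        look-ahead point, then apply the accumulated velocity."""
        v = np.zeros_like(phases)
        for _ in range(steps):
            v = mu * v + desync_displacement((phases + mu * v) % 1.0, alpha)
            phases = (phases + v) % 1.0
        return phases
    ```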

    Convergence of Desynchronization Primitives in Wireless Sensor Networks: A Stochastic Modeling Approach

    Desynchronization approaches in wireless sensor networks converge to time-division multiple access (TDMA) of the shared medium without requiring clock synchronization amongst the wireless sensors or, indeed, the presence of a central (coordinator) node. All such methods are based on the principle of reactive listening of periodic "fire" or "pulse" broadcasts: each node updates the time of its fire message broadcasts based on received fire messages from some of the remaining nodes sharing the given spectrum. In this paper, we present a novel framework to estimate the required iterations for convergence to fair TDMA scheduling. Our estimates are fundamentally different from previous conjectures or bounds found in the literature as, for the first time, convergence to TDMA is defined in a stochastic sense. Our analytic results apply to the Desync algorithm and to pulse-coupled oscillator algorithms with inhibitory coupling. The experimental evaluation via iMote2 TinyOS nodes (based on the IEEE 802.15.4 standard), as well as via computer simulations, demonstrates that, for the vast majority of settings, our stochastic model is within one standard deviation of the experimentally observed convergence iterations. The proposed estimates are thus shown to characterize the desynchronization convergence iterations significantly better than existing conjectures or bounds. Therefore, they contribute towards an analytic understanding of how a desynchronization-based system is expected to evolve from random initial conditions to the desynchronized steady state.
    Comment: to appear in IEEE Transactions on Signal Processing, 2015
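    Since convergence is defined stochastically over random initial phases, the convergence-iteration distribution can also be estimated empirically. The Monte Carlo sketch below does this for a plain Desync-style update; the tolerance, coupling parameter, and convergence criterion are illustrative assumptions (the paper derives analytic estimates rather than simulating).

    ```python
    import numpy as np

    def iterations_to_tdma(n=8, alpha=0.5, tol=1e-3, max_iters=10_000, rng=None):
        """Simulate desynchronization from random phases; return the iteration
        at which every inter-fire gap is within tol of the fair-TDMA gap 1/n."""
        if rng is None:
            rng = np.random.default_rng()
        phases = np.sort(rng.random(n))
        for it in range(1, max_iters + 1):
            prev_ = np.roll(phases, 1);  prev_[0] -= 1.0   # left neighbour, wrapped
            next_ = np.roll(phases, -1); next_[-1] += 1.0  # right neighbour, wrapped
            phases = np.sort(((1 - alpha) * phases
                              + alpha * (prev_ + next_) / 2.0) % 1.0)
            gaps = np.diff(np.concatenate([phases, [phases[0] + 1.0]]))
            if np.abs(gaps - 1.0 / n).max() < tol:
                return it
        return max_iters

    trials = [iterations_to_tdma() for _ in range(100)]
    print(f"mean {np.mean(trials):.1f}, std {np.std(trials):.1f} iterations")
    ```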

    Visualizing and Understanding Contrastive Learning

    Contrastive learning has revolutionized the field of computer vision, learning rich representations from unlabeled data that generalize well to diverse vision tasks. Consequently, it has become increasingly important to explain these approaches and understand their inner workings. Given that contrastive models are trained with interdependent and interacting inputs and aim to learn invariance through data augmentation, the existing methods for explaining single-image systems (e.g., image classification models) are inadequate, as they fail to account for these factors. Additionally, there is a lack of evaluation metrics designed to assess pairs of explanations, and no analytical studies have investigated the effectiveness of the different techniques used to explain contrastive learning. In this work, we design visual explanation methods that contribute towards understanding similarity learning tasks from pairs of images. We further adapt existing metrics, used to evaluate visual explanations of image classification systems, to suit pairs of explanations, and evaluate our proposed methods with these metrics. Finally, we present a thorough analysis of visual explainability methods for contrastive learning, establish their correlation with downstream tasks, and demonstrate the potential of our approaches to investigate their merits and drawbacks.
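    As a concrete, if simplistic, stand-in for the kind of pairwise explanation described above, one can attribute the cosine similarity of two embeddings back to both input images via gradients. The sketch below is an assumption-laden illustration of that idea, not one of the paper's proposed methods.

    ```python
    import torch
    import torch.nn.functional as F

    def pairwise_saliency(encoder, img_a: torch.Tensor, img_b: torch.Tensor):
        """Attribute the pair's embedding similarity back to both inputs,
        given an encoder mapping (B, C, H, W) images to (B, D) embeddings."""
        img_a = img_a.clone().requires_grad_(True)
        img_b = img_b.clone().requires_grad_(True)
        sim = F.cosine_similarity(encoder(img_a), encoder(img_b)).sum()
        sim.backward()
        # Per-pixel attribution: channel-wise max of absolute input gradients.
        return img_a.grad.abs().amax(dim=1), img_b.grad.abs().amax(dim=1)
    ```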